Debugging Machine Learning Models
Author
Abstract
Creating a machine learning solution for a real-world problem often becomes an iterative process of training, evaluation, and improvement, in which best practices and generic solutions are few and far between. Our work presents a novel solution for an essential step of this cycle: understanding the root causes of 'bugs' (particularly consequential or confusing test errors) discovered during evaluation. Given an observed bug, our method aims to identify the training items most responsible for biasing the model towards making this error. We develop an optimization-based framework for generating this information. As a result, our method has simple analytic solutions for certain learners and is also applicable to any supervised learner or data type.
Similar resources
Model-Agnostic Interpretability of Machine Learning
Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user interfaces. Thus, interpretability has become a vital concern in machine learning, and work in the area of interpretable models has found renewed interest. I...
Modeling and Debugging Engineering Decision Procedures with Machine Learning
This paper reports on the use of machine learning systems for modeling existing engineering decision procedures. In this activity, various models of an existing decision procedure are constructed by using different machine learning systems as well as by changing their operational parameters and input. Individual models serve to focus on different aspects of the decision procedure and their combin...
Interpreting Complex Regression Models
Interpretation of machine-learning-induced models is critical for feature engineering, debugging, and, arguably, compliance. Yet best-of-breed machine learning models tend to be very complex. This paper presents a method for model interpretation whose main benefit is that the simple interpretations it provides are always grounded in actual sets of learning examples. The method is valida...
Sources of Variability in Large-scale Machine Learning Systems
We investigate sources of variability of a state-of-the-art distributed machine learning system for learning click and conversion prediction models for display advertising. We focus on three main sources of variability: asynchronous updates in the learning algorithm, downsampling of the data, and the non-deterministic order of examples received by each learning instance. We observe that some so...
Debugging Machine Learning Tasks
Unlike traditional programs (such as operating systems or word processors) which have large amounts of code, machine learning tasks use programs with relatively small amounts of code (written in machine learning libraries), but voluminous amounts of data. Just like developers of traditional programs debug errors in their code, developers of machine learning tasks debug and fix errors in their d...
Thermal conductivity of Water-based nanofluids: Prediction and comparison of models using machine learning
Statistical methods, and especially machine learning, have been increasingly used in nanofluid modeling. This paper presents some of the interesting and applicable methods for thermal conductivity prediction and compares them with each other according to results and errors that are defined. The thermal conductivity of nanofluids increases with the volume fraction and temperature. Machine learni...